ABSTRACT

A study examined middle grades students' learning of 
concepts related to the use and interpretation of graphs. Subjects of 
the study were 76 sixth-grade students in 3 different mathematics 
classes in a central North Carolina middle school. The first two 
parts of the written instrument were administered as both a pretest 
and a posttest, using line plots and bar graphs; the second two 
parts, using stem plots and histograms, were administered only as a 
posttest, since few students have had experience with these graphs. 
For each question on the tests, the analysis involved categorizing 
responses in ways that characterized the nature of students' 
thinking. Results indicated that students: (1) confuse the axes of 
line plot and histogram type graphs; (2) have problems using 
intervals of data; (3) use the "middle" of the data to describe what 
is typical much less frequently than the mode; and (4) seem not to find 
the measures of center, mean and median, readily identifiable 
from the graph. Findings revealed that the manner in which questions 
were posed could influence the categorization of student responses, 
as could the visual features of the graphs, and that students may not 
interpret the word "typical" as intended. 
Further research is suggested to look at both visual and wording 
effects more systematically. (Includes 6 figures of data and 26 
references.)
Building a Theory of Graphicacy: How Do Students Read Graphs?

Susan N. Friel, UNC-Chapel Hill 
George W. Bright, UNC-Greensboro 



Introduction 

This study examined middle grades students’ learning of concepts related to the use and 
interpretation of graphs. We view graphs as part of the process of statistical investigation. A 
statistical investigation typically involves four components: pose the question, collect the data, 
analyze the data, and interpret the results, in some order (Graham, 1987). The use of graphs is 
linked to the “analyze the data” component of the statistical investigation process. Considering 
what it means to understand and use graphical representations is a part of what it means to know 
and be able to do statistics.

Graphicacy 

The term "graphicacy" first appeared in 1965 in an article in The Times Educational Supplement 
by Balchin and Coleman (Boardman, 1983); it was used as a way to broadly describe the use of 
such visualizations as a house plan, farm layout, map of a village, route through a town, sketch of 
a landform, or photograph of a landscape to communicate information about spatial relationships. 
Over time its definition has narrowed; Wainer (1980) used the term "graphicacy" to mean the 
ability to read graphs, defining it as proficiency in understanding quantitative phenomena that are 
presented in a graphical way.

Exactly what constitutes a "graph" has been the subject of various papers (e.g., Bertin, 1980; 
Doblin, 1980; Fry, 1983; Twyman, 1980). Fry's (1983) definition of a graph is generic; that is, "a 
graph is information transmitted by position of point, line or area on a two-dimensional surface" 
(p. 5), including all spatial designs and excluding displays that incorporate the use of symbols such 
as words and numerals (e.g., tables). Guthrie, Weber, and Kimmerly (1993) include graphs with 
diagrams, charts, tables, directions, instructions, illustrations, lists, maps, schematics, drawings, 
blueprints, and forms in characterizing documents as "symbolic displays that do not consist 
primarily of written prose" (p. 187). Wainer (1992), on the other hand, characterized graphs in a 
way that includes statistical graphs that are used to convey information in a variety of fields and 
excludes many of the other kinds of visualizations earlier authors have included. 

For the study reported here, we limited consideration of graphs to standard graphs and plots of 
univariate data that dominate the school curriculum, that is, line plots, bar graphs, stem-and-leaf 
plots (stem plots), and histograms. Other categories of graphs focus on representing bivariate data, 
including scatter plots or line graphs (e.g., Berg, 1992; Mokros, 1985, 1986; Padilla, McKenzie, 
Perspective

In this section, attention is given to identification of factors that influence interpretation of data 
presented in graphs (Bright & Friel, forthcoming; Friel & Bright, 1995a). We discuss the nature 
of data reduction and structure of graphs and the nature of the questions asked as they relate to 
reading graphs. 

Data Reduction and the Structure of Graphs.

The process of data reduction and the structure of graphs are factors that influence graphicacy. 
Data reduction is the transition from tabular and graphical representations which display raw data to 
those which present grouped data. Different graphical representations of numerical data reflect 
different levels of data reduction. A graph may display the original raw data or a graph may display 
grouped data. For example, line plots and stem plots use tallied data, an initial form of data 
reduction; it is possible to identify the original data values from either of these representations. 
Boxplots and histograms are representations of grouped data at a more advanced level of data 
reduction; no longer is it possible to identify individual data values from these graphs. Most 
graphical representations used in the early grades (e.g., picture graphs, bar graphs) involve either 
just the original data or tallied data from which the original observations may be obtained. Students 
in upper grades often use graphical representations of grouped data (histograms, boxplots) from 
which it is usually not possible to return to the data in its original form. 
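The contrast between tallied and grouped data can be made concrete with a short computation. The Python sketch below is an illustration only: the values are hypothetical (not data from this study), and the names `raw`, `tally`, and `group` are introduced here for the example. It shows that a tally, the level of reduction behind a line plot or stem plot, can be expanded back into the original observations, whereas counts per interval, the level behind a histogram, cannot.

```python
from collections import Counter

# Hypothetical observations (illustrative only; not data from this study).
raw = [3, 5, 5, 7, 8, 8, 8, 12, 14, 14]

# Tallied data (line plot / stem plot level): one count per distinct value.
tally = Counter(raw)

# Every original observation can be recovered from the tally.
recovered = sorted(value for value, count in tally.items() for _ in range(count))


def group(values, width):
    """Grouped data (histogram level): counts per half-open interval [lo, lo + width)."""
    counts = Counter((v // width) * width for v in values)
    return {(lo, lo + width): counts[lo] for lo in sorted(counts)}


grouped = group(raw, 5)

print(recovered == sorted(raw))  # True
print(grouped)                   # {(0, 5): 1, (5, 10): 6, (10, 15): 3}
# From `grouped` alone we can no longer tell whether the six values in
# [5, 10) were 5s, 7s, or 8s: individual data values are lost.
```

The interval width of 5 is an arbitrary choice for the sketch; any grouping coarser than the original values loses information in the same way.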

The structure of graphical representations of data may also impact understanding. For 
example, graphical representations utilize one axis or two axes or, in some cases, may not have an 
axis. For graphical representations that use both axes, the axes may have different meanings. In 
some simple graphs, the vertical axis may display the value for each observation while the vertical 
axis for more typical bar graphs and histograms provides the frequency of occurrence of each 
observation (or group of observations) displayed on the horizontal axis. Confusion may develop if 
the different functions of the x- and y-axes across these graphs are not explicitly recognized. 
Misunderstandings attributable to graph structure with respect to other kinds of graphs also can be 
identified. 

Nature of Questions Asked.

As Wainer (1992) notes, the rudiments of a theory of graphicacy need to address the broader 
issue of what kinds of questions graphs can be used to answer. Both Curcio (1987) and Wainer 
(1992) have characterized the kinds of questions that address different levels with respect to 
reading graphs (see Figure 1). Three levels of graphicacy emerge: an elementary level that 
focuses on extracting data from the graph, an intermediate level that involves interpolating and 
finding relationships in the data as shown on the graph, and an overall level that involves 
extrapolating from the data and interpreting the relationships identified from the graph. At this 
third level, questions call for a demonstration of understanding of the deep structure of the data 
presented, in their totality, through the graph.

Insert Figure 1 about here 



Method 

During fall 1994, we conducted a study of the ways that students in grades 6 and 8 made 
sense of information presented through graphical representations and made connections between 
related pairs of graphical representations (i.e., line plots, bar graphs, stem-and-leaf plots, and 
histograms). Students were tested both before and after an instructional unit (Friel & Bright, 1995b) 
developed specifically to highlight a particular sequence of graphs that took into consideration 
increasing degrees of data reduction and building connections between pairs of graphs. Figure 2 
shows both the selected graphs and the questions asked as part of the written pre- and post-test 
instruments; the highlighted questions are the ones considered in this report. Small samples 
from each grade were also interviewed before and after the unit.



Insert Figure 2 about here 



The data analyzed and reported here were collected from a group of 76 sixth-grade students 
who were in three different mathematics classes in a middle school located in central North 
Carolina; they were all taught by the same teacher. They had had little experience with statistics 
prior to this study. Their teacher was part of the Teach-Stat Project¹ and was a statistics educator. 
The study was conducted over a six-week period from mid-October to the end of November 1994.

This study was not designed to assess a particular instructional model or curriculum. Rather, 
the authors reasoned that taking a “snapshot” of what students knew about representations at one 
point in time would not be as productive as trying to assess what students knew both before and 
after having an opportunity to gain experience with the process of statistical investigation and with 
some of the key concepts (including graphs) in statistics. Many prior studies of statistics seem not 
to have addressed the question of change as it may be related to statistics learning, because most 
students have had few relevant learning experiences in this area.



¹ The Teach-Stat Project was a three-year teacher enhancement program that prepared over 450 K-6 teachers to 
teach statistics. Year 1 and Year 2 teachers (300 teachers) participated in 3-week summer institutes. From the 
Years 1 and 2 teachers, 84 teachers were selected to receive additional professional development to prepare them 
to be statistics educators who could help support and train other teachers. See Friel et al. (1996) and Gleason et 
al. (1996) for additional information.
The written instrument was designed so that each set of problems was presented within a 
specific context that would be meaningful to the students (see Figure 3). The questions were 
sequenced in an order that addressed Curcio’s (1987) and Wainer’s (1992) three levels of 
questioning (see Figure 2), moving from read the data to read beyond the data. This was done to 
help students focus on the problem context and gain familiarity with the graphical representation. 



Insert Figure 3 about here 



The first two parts of the written instrument were administered both as a pre-test and as a post-test; 
the authors reasoned that line plots and bar graphs were graphs to which students had had some 
exposure. The second two parts (stem plots and histograms) were administered only as post-tests 
because students had had few experiences with either of these representations (verified through 
an earlier pilot test). The authors did not want to force students to try to interpret data using 
representations with which they were unfamiliar, believing that such a situation might unwittingly 
foster misunderstandings.

For each question on the pre- and post-test written instruments, the analysis involved categorizing 
responses in ways that characterized the nature of students’ thinking. In this paper, for each of 
three different questions taken from the written instrument, we report the categorizations of 
responses followed by discussion of the results. Conclusions address observations made with 
respect to student responses across the three questions considered. 

Results and Discussion 
Part I - Line Plot 

Part I (see Figures 2 and 3) of the written instrument focused on questions related to a line plot 
that showed data presented in a specific context: the numbers of raisins found in half-ounce 
boxes of raisins. The question considered here is an intermediate level question, labeled using 
Curcio’s (1987) category of “read between the data”: Are there the same number of raisins in each 
box? How can you tell?

Results: The data were analyzed by grouping responses to the question into categories according to how 
students reasoned about the question in relation to the data and to the features of the graph.

• properties of the graph (considers both range of data and frequency) 

No, because the x’s are not all on one number. 

No, because the X’s show how many boxes had that many of raisins. Like 28 had 6 and 
29 had 3. 

No, because they are different numbers along the bottom. The X shows how many 
students that found that number. 

No. If there were the same number in each box there would be X’s all above the same 
number. 

• literally “reading” the data from the graph. 
No there aren’t the same number of rasins in each box, I found my answer by looking at 
the data, 6 boxes have 28, 3 have 29, 4 have 30, 3 have 31, 1 has 32, 2 have 34, 6 
have 35, 1 has 36, 3 have 38, and 1 has 40.

No, the data is all scattered out. There were 6 boxes with 28 raisins in them, 3 with 29 
raisins in them, 4 with 30 raisins in them, 3 with 31 raisins in the, 1 with 32 raisins in 
them, 0 boxes with 33 in them and so on. 

• properties related to the context or to the data. 

No, because they weigh the boxes until they equal 1/2 ounce. They don’t count the raisins. 

No Because some raisins can be smaller and that means you can have more. 

• range of the data (considers only range and does not include frequency) 

because it says the number of raisins goes from 26 to 40. 

No, there are not. All you have to do is look at the numbers on the bottom and it tells you 
how many raisans were in each box. 

• frequency of occurrence/height of bars 

No, the X’s have different numbers, so there are different numbers of X’s 
in each box. 

No, Because some do not have as much X’s and some have more. 

No, only two because they are not the same height. 

No. If it was it would be evan across all the same number. 

No, because they Do not have the same number of X’s. 

No. Because there isn’t the same number of X’s above each number. 

• other (includes incomplete, unclear, incorrect, or not statistically-reasoned responses) 

No, the graph shows that there are different numbers of raisins in each box 

No there are not. They all have different amounts. 

No! Because you can look at them and tell that all of them are not the same because each 
one has a different number in each one of them 

No, I read the information on the line plot 



The matrix (in Figure 4) provides the relative frequency of occurrence for each category for 
pre- and post-test responses. 



Insert Figure 4 about here 



There are a number of different observations that may be made about these data: 

• A little over one-fourth of the students offered explanations for their responses that indicated 
that they had an understanding of the role of both the data displayed on the axis and the 
frequencies noted by the X’s displayed above the axis. 

• A large number of students on both the pre-test (45%) and the post-test (38%) offered 
explanations that were coded as “other”, meaning that they were not judged acceptable in terms 
of providing appropriate reasoning for the answer given.

• On the pre- and post-tests, similar percentages (17% - 18%) of students indicated in their 
explanations that their focus was the frequency or the numbers of X’s; these students seemed 
to be saying or actually did say that to have the same number of raisins in each box meant that 
the columns of X’s needed to be the same heights.

• The reasoning of 22% of the students appeared to be stable across time; that is, these students 
used the same kind of reasoning on both tests: properties of the graph (15%), range of the data 
(3%), or frequency/height of the bars (4%).

Discussion: Overall, in this problem, a limited number of students (roughly 28% on both the pre- 
and post-test) were able to reason using information about the data values themselves (from the 
axis) and the frequencies of occurrence of these data values (the X’s). The number of students who 
seemed to focus on the frequency or number of X’s as the data values indicates that there may well 
be confusion about the role of data values and frequencies even when using line plots. We have 
found that such confusions exist in students’ reading of bar graphs; we attribute some of these 
confusions to having to read the frequency using the vertical axis. Here, this is not the case: the 
line plot has no vertical frequency scale, and frequencies are shown only by the stacks of X’s.
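The distinction between data values (on the axis) and frequencies (the stacked X’s) can be illustrated with a small sketch. The Python code below is for illustration only and is not part of the study’s instrument: it uses the raisin counts exactly as transcribed in one student’s response in the Results above, which may not match the plot in Figure 3, and the function names are hypothetical. It draws a text line plot and contrasts the correct test for “the same number of raisins in each box” (every X stacked over a single value) with the misreading that the columns of X’s must be of equal height.

```python
# Raisin counts as transcribed in one student's response (illustrative only):
# value on the axis -> number of boxes, i.e., the number of X's above it.
raisin_tally = {28: 6, 29: 3, 30: 4, 31: 3, 32: 1, 34: 2, 35: 6, 36: 1, 38: 3, 40: 1}


def render_line_plot(tally, lo, hi):
    """Text line plot: data values along the axis, one X per box stacked above its value."""
    values = range(lo, hi + 1)
    rows = []
    for level in range(max(tally.values()), 0, -1):
        rows.append(" ".join(" X" if tally.get(v, 0) >= level else "  " for v in values))
    rows.append(" ".join(f"{v:2d}" for v in values))
    return "\n".join(rows)


def same_number_in_every_box(tally):
    """Correct reading: true only if all X's sit above a single value on the axis."""
    return sum(1 for n in tally.values() if n > 0) == 1


def columns_equal_height(tally):
    """The misreading: it compares frequencies (column heights), not data values."""
    return len(set(tally.values())) == 1


print(render_line_plot(raisin_tally, 28, 40))
print(same_number_in_every_box(raisin_tally))  # False
```

For example, a plot with five X’s over 28 and five over 35 has columns of equal height, yet the boxes clearly do not all hold the same number of raisins.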

This question was the first that students addressed in this portion of the written test. A “read 
the data” question could have been, “How many raisins are in the smallest box?” or “How many 
boxes of raisins had 30 raisins in them?” We chose to move directly to the “read between the data” 
question because we believed that the “read the data” questions would not show much diversity in 
response (earlier pilot testing had substantiated this hypothesis). However, given the results, one 
wonders if the “read the data” questions might have served as a way of clarifying the structure of 
the graph for students before they moved on to the “read between the data” and “read 
beyond the data” questions.

What is interesting about this problem is that all but 2 of the 76 students 
answered the question correctly, i.e., no, there are not the same number of raisins in each box. 
However, their explanations make clear that many students looked at this graph in 
ways that provided incorrect reasoning for their answer. In addition, a large number of students 
provided vague or incomplete responses that seem to say “you know...the graph says this!” Part of 
this may result from our own lack of clarity on how we expect students to be able to talk about 
graphs. Another part may reflect the usual emphasis in mathematics on “getting an 
answer” and seldom on the follow-up questioning that insists on clear explanations from students 
of the reasons for the answers they give.

Part III - Stem Plot 

Part III (see Figures 2 and 3) of the written instrument focused on questions related to a 
stem-and-leaf plot that showed data presented in a specific context: the time it takes students to 
travel to school. The question considered here is an overall level question, labeled using Curcio’s 
(1987) category of “read beyond the data”: What is the typical time it takes for students to travel to 
school? Explain your answer.

Results: The data were analyzed by grouping responses to the question into categories according to how 
students reasoned about the question in relation to the data and to the features of the graph.
• Responses that identified the mode in the data:

23 minutes, their are more three’s on the 20 column than any where else 

23 minutes. In each row you will find different well I happen to look at the 20’s section 
and there was a tripple number and that is how I go may answer.

23 minutes, looked at the graph and 23 minutes was most common 

• Responses that identified a cluster of times that can be characterized as a modal interval:
20 to 28 minutes because thats how long it takes most of the kids 

20 to 28 minutes I looked at the numbers that has the most amount. 

The typical is any where from 20-28 because that is where most people are. 

20 to 30 minutes, because I looked at the one with the most [mislabels upper bound of 
stem]

• Responses that identified the mode of the leaves: 

3 minutes because 3 appears more frequently [misses “5”, which occurs with the same 
frequency]

5 and 3 

3 min. It takes 3 min to get to school. That is the typical time 

• Responses that identified the median: 

18 min. I counted the number of students then devide the number of students and then I 
counted over than when I got to the number that I has I stop and that was my answer. 
[miscalculates median] 

18 1/2 min. It is the median, the middle piece of the data. 

The typical time it takes to travel to school is 18 and a half min. because it sit he median 
number. 

• Responses that identified the median of the leaves: 

8 1/2 minutes is the typical time it takes students to the from the home to the school 
building. 

• other (includes incomplete, unclear, incorrect responses) 

32. Because more people have 32 min. [Student reads tens from leaves and ones from stem 
- reverses process] 

It takes a student from 20 to 23 min. you just have to look at the stem and leaf plot to figure 
it out. 

35 - 45 

Figure 5 provides the relative frequency of occurrence of each category for post-test 
responses. (Recall that stem plots were assessed only as part of the post-test.) There are a number 
of different observations that may be made about these data. 



Insert Figure 5 about here 



• 75% of the students responding used mode or modal interval as a way of describing the typical 
time it takes to travel to school. Explanations for their reasoning indicated that students 
understood how to read stem plots.
• Approximately 1 % of the students used the median as a way of describing the typical time it 
takes to travel to school. Explanations for their reasoning indicated that students understood 
how to read stem plots.

• Approximately 6% of the students identified either the mode of the leaves (naming both values, 
3 and 5, or only one of them) or the median of the leaves. They appear not to 
have considered the role of the stem in naming the actual times.

• A little more than 10% of the students made some type of error in reading the graph and/or 
answering the question, so their responses were judged as incorrect. One student actually read 
the stem plot “backwards”, reading the leaf as the tens digit and the stem as the ones digit.

Discussion: Stem plots are fairly recent additions to the repertoire of graphical representations. 
Stems usually represent the tens digits of the data values and leaves usually represent the ones 
digits. This means that the "20s stem" is a row in which data values from 20 to 29 can be placed. 
However, if 27 is the largest data value less than 30, 27 will be the largest value actually 
shown in the 20s stem. A careful reading of data displayed in a stem plot is required, since 
some confusions are possible due solely to the visual elements of the stem plot. One example is the 
student who identified 32 as the typical time (response categorized as incorrect); this demonstrates 
a rather fundamental misreading of the stem plot. As another example, a number of students 
selected the modal interval as a way to cluster data and describe the typical time. However, if the 
stem plot were restructured to display data in intervals of 5, an obvious cluster could be identified 
from 15 to 28 minutes (using intervals of 15-19, 20-24, and 25-29).
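The reading rule for stem plots, and the effect of restructuring rows into intervals of 5, can be sketched in a few lines of Python. This is an illustration only: the travel times below are hypothetical (not the data in Figure 3), and the helper names are introduced here for the example. Each row is keyed by its starting value; the tens digit comes from that row and the ones digit from the leaf, which is exactly the rule the student who answered “32” reversed.

```python
from collections import defaultdict

# Hypothetical travel times in minutes (illustrative only; not the study's data).
times = [5, 8, 12, 15, 16, 18, 18, 20, 21, 23, 23, 23, 25, 27, 28, 35, 42]


def stem_and_leaf(values, stem_width=10):
    """Map each row's starting value to its sorted leaves (ones digits).

    stem_width=10 gives one row per tens digit (e.g., 20-29);
    stem_width=5 splits each tens row in two (e.g., 20-24 and 25-29).
    """
    rows = defaultdict(list)
    for v in sorted(values):
        rows[(v // stem_width) * stem_width].append(v % 10)
    return dict(rows)


def read_back(rows):
    """Rebuild the data: tens digit from the row (stem), ones digit from the leaf."""
    return [(start // 10) * 10 + leaf for start in sorted(rows) for leaf in rows[start]]


def show(rows, stem_width=10):
    for start in sorted(rows):
        leaves = " ".join(str(leaf) for leaf in rows[start])
        print(f"{start:>2}-{start + stem_width - 1:<2} | {leaves}")


show(stem_and_leaf(times))        # one row per tens digit
show(stem_and_leaf(times, 5), 5)  # the same data in intervals of 5
```

Because a stem plot only tallies the data, `read_back` recovers every original value under either row width; only the visual grouping changes.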

Even though students had received instruction involving three measures of center (mode, 
mean, and median), mode was the most frequent strategy for describing typical time. Given that 
the stem plot organizes data in apparent clusters, one might have anticipated that more students 
would have used the strategy of “modal interval”. Rather students seemed to ignore features of the 
graph and focus on individual data items without much thought as to reasonableness of such a 
strategy for explaining what is typical in this kind of data situation. 

Part IV - Histogram 

Part IV (see Figure 2) of the written instrument focused on questions related to a histogram 
that showed data presented in a specific context: the allowances of a group of 60 students.

The question considered here is an overall level question labeled using Curcio’s (1987) 
category of “read beyond the data”: Another student's parents have agreed to look at data about 
kids allowances. Then they will decide what allowance to give the student. Using the 
histogram showing this information for 60 students, what allowance do you think the student 
should make a case for? Why? 
Results: The data were analyzed by grouping responses to the question into categories according to how 
students reasoned about the question in relation to the data and to the features of the graph. The 
categories below label the reasoning strategies that students used.

• Responses that identified “modal intervals”: 

a. By naming only the lower bounds of each interval: 

$2.00 or $5.00 that is what most of the kids in the class have. 

I think they should get $2.00 or $5.00 because more student allowances is $2.00 to 
$5.00. 

b. By naming only the upper bounds of each interval:

He should get either $2.50 or $5.50. He should get this because the mode is these two 
numbers. 

The student should make a case for the typical amount - 2.50, 5.50 because most 
people have that much 

c. By naming the intervals: 

They should make a case for $2 to $2.50 or $5 to $5.50. I think this is because it is the 
most popular. 

$2.00 to $2.50 or $5.00 to $5.50. I think this because most of the students that are 
represented gets this ammount for their allowances. 

• Responses that identified the greater (in value) “modal interval”: 

a. By naming only the lower bound of the interval: 

$5.00 is the typical amount. 

$5.00 has the most students on it. 

$5.00 dallor I think $5.00 is for the student that when they get bigger then give them 
some more. 

b. By naming only the upper bound of the interval: 

I think he should make a case for $5.50. I think this because it is the highest mode. His 
dad might let him since it is typical. 

c. By naming the interval:

5.00 to 5.50, the most people and pretty much money 

Between $5.00 and $5.50. I say that because a lot of people get that amount. Between 
$5.00 and $5.50 ties with $2.00-$2.50 but I give him the benefit.

The student should get between $5.00 and $5.49. It is the mode. 

• Responses that identified the smaller (in value) “modal interval”: 

a. By naming only the lower bound of the interval: 

$2.00 it kind of low and thats what you should get. 

$2.00 because most kinds get about that much. 

b. By naming only the upper bound of the interval: 

$2.50 because the most kids get that amount. 

They should make a case for $2.50 because there is nine people that receive that much. 
$2.50 Because $2.50 appears most frequently. 

c. By naming the interval: 

I think $2.00 - $2.50 because most people have it and it is resamble price to pay. 

$2.00 $2.50. That is the typical amount of money somebody gets. 

• Responses that identified “the middle of the data”: 

$3.00 to $3.49 because it is the median and $3.00 dollars is plenty but not to much. 

$3.25. Because it is the middle number of data [Middle of interval $3.00 - $3.50] 

$4.50 because it is the avarge number it’s in the middle.

$3.50 because it is the very middle amount of money (allowence.) 

I think the student should make a case for [$3.00 to] $3.50 because it is right in the middle 
of the two averages. 




4/3/96

Paper Presented at the Annual Meeting of AERA - 1996, New York



$3.75 Because it is the middle number of the data. Also this is a resinable allowance. 

[Note: In the middle of the x-axis scale if $9.50 - $9.99 is excluded]

I think he should make a $3.75 because that numbers between $2.00 and $5.50. 

$5.00 because it goes $0.00 to $10.00 dollars and $5.00 dollars is half of $10.00. 

Probably about $5.25 since most people have an allowance in that area. $2-$2.50 seems 
like not enough. 

I think that they should have 5.00-5.50 because thats in the middle. 

• Responses that identified clusters in the data:
2.00 to 5.50 more people get that allowance 

Between $4.00 and $5.50 because most children have that allowence. 

• other (evaluated as incorrect) 

a. Unclear 

He sould go for 5.50 because that person got paid more [Not clear how the student is 
reading the graph; the reference to “that person” suggests the student may be viewing each bar 
as representing a single student.]

The students that get $0.00 - $0.50, because it could get them to get more allowance 
then they get. 

b. Reading the y-axis 

$2.50, $3.00, $3.50, $5.50, 6.50, $7.00, 9.50 [Looks like this student was reading 
data from the y-axis - notice that each of these is the “lower” bound of an interval in 
which there is only one student] 

1. because more students have 1. [Looks like this student was reading data from the y-axis]

1 because that is the least [Looks like this student was reading data from the y-axis] 

c. Personal judgment/reference

$3.00 because that’s all they should need. 

$20.00 to $25.00, because that is what I get. 

$7 dollars a week. So that if they want something nice it won’t take them long term to 
earn the money they need for something. 

I think the student she make a case about the student who got from $9.50 to $10.00, 
because this student might get the same amount. 

Figure 6 provides the relative frequency of occurrence of each category for post-test responses. 
(Recall that histograms were assessed only as part of the post-test.) There are a number of different 
observations that may be made about these data. 



Insert Figure 6 about here 



• Almost 60% of the students used some form of identifying modal intervals as a way to describe 
the typical allowance. A little more than 30% of the students highlighted the greater mode when 
creating arguments for which allowance should be paid while about 13% highlighted the lower 
mode. The remaining 17% acknowledged the presence of both modal intervals but often did 
not take a stand on how to use this fact to make an argument for an allowance.
• About 10% of the students sought to describe the middle of the data in some way. Some tried 
to locate the median; others used the x-axis in seeking to identify a “middle” allowance.

• A small number of students (4%) explored the use of clusters as a way to identify a typical 
allowance. 

• A little more than 25% of the students gave an incorrect answer to the question or did not 
answer the question. 

Discussion: A histogram represents data that have been grouped in intervals. In the 
instructional unit, students were told about the conventions of histograms; that is, the bars touch, 
each bar represents data in an interval, and an interval includes its lower bound but not its upper 
bound (e.g., the interval 5-10 includes all data from 5 up to but not including 10). 
This information was not, however, the central focus of the unit.
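The half-open convention (lower bound included, upper bound excluded) can be stated precisely as a small function. The Python sketch below is illustrative only: it assumes a $0.50 class width, as suggested by intervals such as “$5.00 to $5.49” and “$9.50 - $9.99” named in the text, and the function names are hypothetical. Amounts are kept in whole cents so that boundary values such as $5.50 are not affected by floating-point rounding.

```python
# Amounts are in whole cents to avoid floating-point rounding at boundaries.
WIDTH_CENTS = 50  # assumed $0.50 class width, as in intervals such as $5.00-$5.49


def interval_for(cents, width=WIDTH_CENTS):
    """Return (start, end) of the half-open interval [start, end) containing cents."""
    start = (cents // width) * width
    return start, start + width


def label(interval):
    """Label an interval by its inclusive range of whole cents, e.g. '$5.00 to $5.49'."""
    start, end = interval
    return f"${start / 100:.2f} to ${(end - 1) / 100:.2f}"


for amount in (500, 549, 550):
    print(amount, label(interval_for(amount)))
# 500 $5.00 to $5.49
# 549 $5.00 to $5.49
# 550 $5.50 to $5.99
```

Under this convention an allowance of exactly $5.50 belongs to the next bar, even though $5.50 is printed at the right edge of the $5.00 bar on the x-axis.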

One component related to the structure of the graph that surfaces in student responses is how 
students chose to refer to intervals on the graph; very few students described intervals as “$5.00 to 
$5.49”. Rather, students named intervals as “$5.00 to $5.50”. This suggests that students who 
identified $5.50 as a good allowance because many students had this allowance were thinking of it 
as part of the interval “$5.00 to $5.49”, because of the way the x-axis is labeled. We do, however, 
have other evidence that, for other kinds of questions, students correctly interpret the 
boundaries of an interval. We have therefore counted “$5.50” as a correct response here, since our 
purpose is to characterize students’ thinking about the question in light of the graph rather than 
their explicit understanding of the graph structure.

Rather than referring to intervals, students seemed to focus on identifying single dollar 
amounts to describe allowances. They named either the lower bound of the interval (e.g., $2.00 
or $5.00) or the upper value labeled on the axis for that interval (e.g., $2.50 or $5.50).

Finding the middle of the data is more problematic when the data are displayed in a 
histogram. With 60 students, the median is the average of the 30th and 31st data values, and it 
actually falls between two intervals: $3.50-$3.99 and $4.00-$4.49. It’s not clear how students 
who identified a median reasoned through their answers. One strategy would involve locating the 
interval(s) in which the 30th and 31st data values occurred; such an interval could be named the 
“median interval”. It is likely that students had little, if any, experience in their instruction in 
thinking about the median as it relates to location in a histogram. 
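
The “median interval” strategy amounts to accumulating bar heights until the middle rank is reached. The Python sketch below illustrates it under stated assumptions: bars are $.50 wide and amounts are in cents, the function names are hypothetical, and the bar heights are made up for illustration (only the first two, 5 and 3, come from the task description). The heights were chosen so that the 30th and 31st of 60 values land in adjacent bars, consistent with the observation that the median falls between two intervals; they are not the study's data.

```python
def interval_containing(rank: int, heights: list[int], width: int = 50) -> tuple[int, int]:
    """Return the [lower, upper) interval, in cents, holding the value of the given 1-based rank.

    heights[i] is the number of values in [i * width, (i + 1) * width).
    """
    cumulative = 0
    for i, count in enumerate(heights):
        cumulative += count  # values with ranks 1..cumulative lie in bars 0..i
        if rank <= cumulative:
            return i * width, (i + 1) * width
    raise ValueError(f"rank {rank} exceeds the {cumulative} values shown")


def median_intervals(heights: list[int], width: int = 50) -> list[tuple[int, int]]:
    """Return the interval(s) containing the middle value(s) of the data."""
    n = sum(heights)
    if n == 0:
        raise ValueError("the histogram contains no data")
    # For even n the median averages ranks n/2 and n/2 + 1;
    # for odd n both expressions give the single middle rank (n + 1) / 2.
    ranks = sorted({(n + 1) // 2, (n + 2) // 2})
    intervals = [interval_containing(r, heights, width) for r in ranks]
    return list(dict.fromkeys(intervals))  # drop a duplicate if both ranks share a bar


# Hypothetical bar heights for 60 students, $.50-wide bars starting at $0.00.
heights = [5, 3, 4, 4, 3, 4, 3, 4, 6, 3, 9, 4, 2, 1, 1, 2, 2]
print(sum(heights), median_intervals(heights))
```

When both middle ranks fall in the same bar, a single interval is returned; when they straddle two bars, as in this made-up example, both intervals are returned.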

Three of the students responding incorrectly to this question seemed to focus on the y-axis, 
noting that there were more “bars” of height 1 than bars of any other height and so either these 
values or the value “1” should be used to determine an appropriate allowance. 
Conclusions 

Our analyses focused on a descriptive summary of ways to categorize students’ responses to 
the different levels of questions posed. Observations can be made across questions: 

• In both the line plot and the histogram, we see students continuing to confuse the axes of the 
graphs and using either the number of Xs or the y-axis as a source of information about the 
values of data rather than about the frequencies of occurrence of data values. 

• Using intervals of data (in stem plots and histograms) may be problematic for students. This 
highlights the increasing abstraction related to the process of data reduction. As data values 
become further grouped within intervals, students attempt to use strategies such as finding the 
mode or median but think about the data as individual data values rather than re-structuring 
their thinking to consider data grouped in intervals. 

• The “middle” of the data is used much less frequently than the mode as a way to describe 
what is typical. With the stem plot, students appeared to have strategies that worked for finding 
the median data value, most likely because the individual data values could be identified. With 
the histogram, the median, as an individual data value, could not be identified. It appears that 
students sought to cluster the data and to locate a middle within a cluster by using the values 
listed on the x-axis. 

• If we consider measures of center and their relationship to graphs, the mode is easily identified 
when data are represented in graphical form. However, the other two measures of center, the 
mean and the median, are not readily identified from the graph, although it may be possible to 
help students estimate the location of each. It may be that students are using measures of center 
to help them describe what is “typical” in a set of data and, because the mode is easily identified 
from the graph, are relying on it as a descriptor when they use a graph to answer this question. 
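
The contrast drawn in the last observation can be illustrated with a short Python sketch. Finding the modal interval needs only the tallest bar, whereas estimating the mean needs every bar; the midpoint method used here is a standard approximation for grouped data, not a strategy reported by the students or proposed in this study. The function names and the bar heights are hypothetical, and bars are again assumed to be $.50 (50 cents) wide.

```python
def modal_intervals(heights: list[int], width: int = 50) -> list[tuple[int, int]]:
    """Return every [lower, upper) interval whose bar is tallest (ties are all kept)."""
    tallest = max(heights)
    return [(i * width, (i + 1) * width) for i, count in enumerate(heights) if count == tallest]


def estimated_mean(heights: list[int], width: int = 50) -> float:
    """Estimate the mean by treating each value as sitting at its bar's midpoint."""
    n = sum(heights)
    if n == 0:
        raise ValueError("the histogram contains no data")
    total = sum(count * (i * width + width / 2) for i, count in enumerate(heights))
    return total / n


heights = [2, 5, 3]  # hypothetical bars: $0.00-$0.49, $0.50-$0.99, $1.00-$1.49
print(modal_intervals(heights), estimated_mean(heights))
```

The modal interval can be read off in one glance, while the mean estimate requires weighting every bar by its height, which mirrors why the mode may be the more accessible descriptor when students work from a graph.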

Limitations 

There are a number of questions that arise with respect to this study. For example: 

1. How did the questions (wording, level, type of data, type of graph) influence the categorization 
of student responses? Other studies that look at the effects of different questions for a single set 
of data and a single graph may be warranted. 

2. How did the visual features of the graphs influence the categorization of students’ responses? 
For example, in the first question dealing with the line plot, the labeling of the x-axis may have 
influenced the ways that students responded and, consequently, the ways we categorized their 
responses. Similarly, the leaves on the stem plot seemed to be important visual cues to the 
students; in the histogram, the combination of labels on the x-axis and the bars representing 
intervals of data seemed also to be important cues. 
3. Students’ interpretation of the word typical and the interaction of this word with the graph also 
need to be considered. Further, asking “What is the typical time?” and asking “What allowance 
do you think the student should make a case for?” may be intended to elicit similar strategies, 
such as using measures of center, but students may actually view them as very different kinds of 
questions. 

Additional studies may be needed to look at both visual and wording effects more 
systematically. 

Fundamental to graphicacy are the broader issues of what kinds of questions graphs may be 
used to answer. We have found that exploring learners’ responses to different kinds of questions 
gives us some knowledge about learners’ thinking. We need to be clear about what attributes of 
statistical thinking we want to promote and about ways to promote these attributes. 
Figure 1: Levels of Questions - Ability to Read Graphs 

Curcio, 1987: Reading the data involves “lifting” the information from the graph to answer 
explicit questions for which the obvious answer is right there in the graph, for example, “How 
many students have 12 letters in their names?” 

Wainer, 1992: Elementary level questions that involve data extraction, for example, “What 
percent of the cats admitted to the animal shelter in June, 1990 were kittens?” 

Curcio, 1987: Reading between the data includes the interpretation and integration of 
information that is presented in a graph. Questions involve at least one step of logical or 
pragmatic inferring in order to get from the question to the answer, for example, “How many 
students have more than 12 letters in their names?” 

Wainer, 1992: Intermediate level questions that involve identifying trends seen in parts of the 
data, for example, “Between January and June, how does the percent of kittens admitted to the 
animal shelter change?” 

Curcio, 1987: Reading beyond the data involves extending, predicting, or inferring from the 
representation to answer implicit questions. The reader gives an answer that requires prior 
knowledge about a question that is at least related to the graph, for example, “If a new student 
joined our class, how many letters would you predict that student would have in her name?” or 
“Looking at the data about name lengths from several different classes of students, what kinds 
of patterns do you observe in name lengths across the classes?” 

Wainer, 1992: Overall level questions that involve an understanding of the deep structure of the 
data being presented in their totality, usually comparing trends and seeing groups, for example, 
“Using the data from 1990, which month in 1991 do you predict will show the most dramatic 
increase in the number of kittens admitted to the animal shelter?” or “Looking at the graph that 
shows percents of cats, kittens, dogs, and puppies admitted to the animal shelter over the 12 
months in 1990, which of the four categories of animals show the same pattern over the twelve 
months?” 
Note: On this written test, a specific question related to “reading the data” was not asked; the question shown is a sample of the kind of question that can be asked. 
Figure 3 

Part I: Raisins 

Students brought several different foods to school for snacks. One snack that lots of them 
like is raisins. They decided they wanted to find out just how many raisins are in 1/2 ounce 
boxes of raisins. They wondered if there were the same number of raisins in every box. 
The next day for snacks they each brought a small box of raisins. They opened their boxes 
and counted the number of raisins in each of their boxes. 

For the line plot showing the information they found, see Figure 2. 



Part II: Lengths of Cats 

A group of students has been investigating information about their pets. Several students 
have cats. They decided to collect some information about each of the cats. One set of data 
they collected was the length of the cats measuring from the tips of the cats' noses to the 
tips of their tails. 

For the bar graph showing the information they found, see Figure 2. 



Part III: Travel Time to School 



Students were interested in how they used their time. They brainstormed a list of ways 
such as sleeping, eating, after school sports, and so on. Jim reminded them that some of 
their time is used just traveling back and forth to school. Some of the students thought this 
shouldn't count because it really wasn't much time at all. Others disagreed. The class 
wondered, "What is the typical time it takes to travel to school?" 

For the stem and leaf plot showing the information they found, see Figure 2. 



Part IV: Allowances^ 



Sometimes students have to make a convincing argument for an allowance. The histogram 
shows allowances for 60 students. The first bar shows that five students received an 
allowance that was less than $.50. So one of those five students might receive no 
allowance and another student might receive $.35, but no student in this group received an 
allowance of $.50. The second bar shows that three students received an allowance that 
was at least $.50 but less than $1.00. So one of the three students might receive an 
allowance of $.50 or of $.75, but no student in this group received an allowance of $1.00. 

For the histogram showing the information they found, see Figure 2. 



^ Task adapted from Mokros & Russell, 1995. 
Figure 4: Percent Category Responses (N = 76) - Raisins 

Rows give the pretest category; columns give the posttest category. 

Category (Pre)                          1      2      3      4      5      6  % TOTAL (Pre)
1. properties of the graph           14.5      -      -    1.3    3.9    7.9           27.6
2. reading the graph                    -      -      -      -      -      -              -
3. properties of the context/data       -      -      -      -      -    1.3            1.3
4. range of the data                  2.6    1.3      -    2.6    1.3    1.3            9.2
5. frequency/height of bars           7.9      -      -      -    3.9    5.2           17.1
6. other (incorrect)                  3.9    2.6    2.6    3.9    9.2   22.3           44.7
% TOTAL (Post)                       28.9    3.9    2.6    7.9   18.4   38.1          100.0

Figure 5: Percent Category Responses (N = 76) - Travel Time 

Category                   % TOTAL
1. mode                       68.4
2. modal interval              6.6
3. mode of the leaves          5.2
4. median                      6.6
5. median of leaves            1.4
6. other (incorrect)          11.8
% TOTAL (Post)               100.0

Figure 6: Percent Category Responses (N = 76) - Allowance 

Category                   % TOTAL
1. modal intervals            17.1
2. greater modal interval     28.9
3. lower modal interval       13.2
4. middle of the data         10.5
5. clusters                    4.0
6. other (incorrect)          17.1
7. did not answer              9.2
% TOTAL (Post)               100.0